CycPeptMP: enhancing membrane permeability prediction of cyclic peptides with multi-level molecular features and data augmentation

Abstract Cyclic peptides are versatile therapeutic agents that boast high binding affinity, minimal toxicity, and the potential to engage challenging protein targets. However, the pharmaceutical utility of cyclic peptides is limited by their low membrane permeability, an essential indicator of oral bioavailability and intracellular targeting. Current machine learning-based models of cyclic peptide permeability show variable performance owing to the limitations of experimental data. Furthermore, these methods use features derived from the whole molecule that have traditionally been used to predict small molecules and ignore the unique structural properties of cyclic peptides. This study presents CycPeptMP: an accurate and efficient method to predict cyclic peptide membrane permeability. We designed features for cyclic peptides at the atom-, monomer-, and peptide-levels and seamlessly integrated these into a fusion model using deep learning technology. Additionally, we applied various data augmentation techniques to enhance model training efficiency using the latest data. The fusion model exhibited excellent prediction performance for the logarithm of permeability, with a mean absolute error of 0.355 and correlation coefficient of 0.883. Ablation studies demonstrated that all feature levels contributed and were relatively essential to predicting membrane permeability, confirming the effectiveness of augmentation to improve prediction accuracy.
A comparison with a molecular dynamics-based method showed that CycPeptMP accurately predicted peptide permeability, which is otherwise difficult to predict using simulations.


Introduction
Insulin, first isolated in 1921, became the first therapeutic peptide. Subsequently, peptides were frequently studied as they combine the advantages of small-molecule and antibody drugs. The growing advancements in genetic engineering, peptide synthesis technologies, and sequence analysis tools have led to the development of new classes of peptide therapeutics for various applications [1][2][3][4]. Meanwhile, many computational methods have been developed to efficiently predict the properties of linear peptides [5][6][7]. However, certain limitations of conventional linear peptides, such as low stability, selectivity, and cell membrane permeability, remain unresolved [8,9]. In contrast to linear peptides, the unique structural features of macrocyclic peptides stem from their restricted conformational flexibility and local secondary structure motifs, allowing for bioactive conformations with remarkable potency and selectivity [10,11]. Therefore, the value of cyclic peptides is increasing in pharmaceutical research due to their high binding affinity, stability, target selectivity, and ability to inhibit intracellular protein-protein interactions [12][13][14]. In the past century, cyclic peptide drugs were predominantly sourced from natural products, including antimicrobial agents and human peptide hormones. Recent advances in novel synthesis and screening systems have led to breakthroughs in cyclic peptide drug discovery [15,16]. For example, the random nonstandard peptides integrated discovery (RaPID) system designs cyclic peptides from a diverse library, including non-natural amino acids, enabling the synthesis and rapid selection of potent binders for a wide range of therapeutic targets [17]. The RaPID system has been used to design novel cyclic peptides for complex therapeutic targets, including a high-affinity binder to the osteoporosis target PlexinB1 [18], an inhibitor of the ubiquitin-protein ligase E6AP [17], and a selective inhibitor of the oncogenic K-Ras [19]. Over 40 cyclic peptide drugs are currently in clinical use, with the FDA having approved approximately one macrocyclic peptide drug per year for the past 20 years [16,20].
Despite their pharmacological potential, cyclic peptides often exhibit poor membrane permeability, severely limiting their biological applications and the development of orally available drugs [9]. The mechanism underlying membrane permeation by cyclic peptides remains unclear; however, cyclic peptides with a "closed" conformation in hydrophobic environments often exhibit enhanced permeability [21,22]. The "closed" conformation conceals polar groups through intramolecular hydrogen bonds and lipophilic side chains, contributing to their increased permeation efficiency. Drawing inspiration from the structure of cyclosporin A, a naturally occurring N-methylated macrocyclic peptide with high permeability, shielding of the exposed hydrogen bond donor (-NH) through N-methylation has been widely employed to enhance membrane permeability [23,24]. Various strategies, such as conformational control [12,25], amide-to-ester substitution [26], amide-to-thioamide substitution [10], and side-chain modifications [27], have emerged for improving membrane permeability. However, these strategies do not improve membrane permeability across all cyclic peptides.
Selection of candidate compounds with high membrane permeability is important during the early stages of drug development. Because measuring the permeability of numerous peptides at random using biochemical assays is costly, a rapid computational method for predicting membrane permeability is highly desirable. Computational approaches to predict the permeability of cyclic peptides have been primarily based on molecular dynamics (MD) simulation [28][29][30][31]. Markov state models were used to analyse simulation data and elucidate cyclic peptide behavior [32], which is crucial to understanding membrane permeation mechanisms and optimizing structures to enhance membrane permeability. However, the computational cost of MD-based methods is a major limitation. In contrast to MD-based methods, several physicochemical or machine learning models have been developed, offering more rapid prediction capabilities [33][34][35][36][37]. The descriptors for hydrophobicity, such as the octanol-water partition coefficient (LogP), are generally the most important features for prediction. However, unlike the property prediction methods for small molecules and linear peptides, which have considerable experimental data, these models were established using limited data sets (10-250 cyclic peptides) and thus generalize poorly. Furthermore, these methods directly apply whole-molecule features, typically used to predict the membrane permeability of small molecules, while ignoring the unique structural characteristics of cyclic peptides, such as sequence information and circularity.
Unlike conventional physicochemical and machine learning approaches, deep learning (DL) models offer an architectural design tailored to peptide characteristics [38] and can automatically extract, from datasets, structural features that are more complex than those of small-molecule compounds. DL-based small-molecule property prediction methods based on graph neural networks (GNNs) and transformers have become a major research area. By representing atoms as nodes and bonds as edges, GNN-based methods can effectively capture molecular structural information and integrate the topological structure of molecules with complex atomic features. Nonetheless, most existing approaches, such as GCN [39,40], GAT [41,42], and MPNN [43], have intrinsic limitations, including a poor ability to process global information and a risk of over-smoothing when many atoms are present. In contrast, many transformer-based methods have been proposed that treat SMILES as strings, following the success of this approach in natural language processing [44][45][46]. Since these methods lack structural information, several methods have been developed that take molecular graph representations as input, which can encode more complex atom and bond information than strings [47][48][49][50].
However, cyclic peptide permeability datasets are limited and diverse, with discrepancies and errors stemming from the use of various assay systems. To address these limitations, a comprehensive database of cyclic peptide membrane permeability, called CycPeptMPDB, was constructed [51]. CycPeptMPDB comprises information on 7334 cyclic peptides, including structures and experimentally measured membrane permeabilities, from 45 published studies and 2 patents from pharmaceutical companies. It represents the first platform for developing DL-based prediction methods. Interestingly, over 99.6% of cyclic peptides include non-natural amino acids, suggesting that they were created to enhance permeability through chemical modifications, such as N-methylation, or by deliberately incorporating non-natural building blocks in their design. In addition to the experimental data, the database contains relevant supporting information, such as 3D conformations and hierarchical editing language for macromolecules (HELM) sequence representations, which are composed of uniquely defined monomers (substructures, such as residues). This allows users to analyse data for various applications.
Cyclic peptides are characterized by complex conformational dynamics, where even a minor alteration in a single residue can lead to substantial changes in their membrane permeability [52]. Therefore, many publications in CycPeptMPDB have focused on measuring changes in membrane permeability while only varying a few residues and maintaining a largely constant sequence. The combined use of multi-scale molecular features can improve the accuracy of predicting small molecule properties [49,53] and peptide-protein binding [54]. This study proposes CycPeptMP: a membrane permeability prediction model for cyclic peptides that effectively integrates multi-level features with state-of-the-art DL techniques. We engineered features at the atom, monomer, and peptide levels to concurrently capture the local sequence variations and global conformational changes in cyclic peptides. Additionally, we employed data augmentation methods at the atom, monomer, and peptide levels to enhance the training efficiency of our model for complex cyclic peptides.

Experimental dataset
We used the structure and logarithm of experimentally determined membrane permeability ($\mathrm{LogP_{exp}}$) of peptides in CycPeptMPDB. CycPeptMPDB contains permeability data based on the parallel artificial membrane permeability (PAMPA), Caco-2, Madin-Darby canine kidney (MDCK), and Ralph Russ canine kidney (RRCK) assays. We selected the PAMPA entries, which had the largest number of data points. If the same peptide was measured in multiple publications, the value recorded in the latest publication was used. Consequently, 6889 peptides were selected, covering a relatively wide range of molecular weights, from 342.44 to 1777.74. Considering that the lower limit of $\mathrm{LogP_{exp}}$ in CycPeptMPDB was −10 ($1 \times 10^{-10}$ cm/s, 240 peptides), but the detection limit in most publications was −8 ($1 \times 10^{-8}$ cm/s), we rounded the permeabilities of 314 peptides with values lower than −8 to −8. Similarly, the permeability of one peptide with a value higher than −4 was rounded to −4.
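The rounding of out-of-range values described above amounts to clamping each value into the interval [−8, −4]. A minimal sketch, assuming the values are held in a plain list; the function name `clip_log_permeability` is hypothetical and not taken from the study:

```python
def clip_log_permeability(values, lower=-8.0, upper=-4.0):
    """Clamp each log-permeability value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# Toy values (not from CycPeptMPDB): two below the detection limit, one above -4.
raw = [-10.0, -8.5, -6.2, -4.0, -3.9]
print(clip_log_permeability(raw))  # [-8.0, -8.0, -6.2, -4.0, -4.0]
```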
The validation and test sets were extracted from the overall data for model evaluation. First, the Kennard-Stone (KS) algorithm was employed to extract 5% of all data (344 peptides) as the test set, so that it uniformly covers the multidimensional space [55]. We generated 2048-bit Morgan fingerprints (Morgan FP, radius: 2), and the KS algorithm selected the test set so that the Euclidean distances between the selected data points were maximized. From the remaining data, we randomly extracted 5% validation sets (344 peptides) three times for parameter tuning, with no overlap between the three datasets. The membrane permeability and molecular weight distributions for each set are shown in Fig. 1. The mean absolute error (MAE), mean squared error (MSE), correlation coefficient ($R$), and coefficient of determination ($R^2$), each averaged over three repeated runs, were used as evaluation metrics.
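The Kennard-Stone selection described above can be sketched as follows. This is a textbook version of the algorithm written in plain Python for illustration, not the code used in the study; the function names and the toy 4-bit vectors (standing in for 2048-bit Morgan fingerprints) are assumptions. It stores the full distance matrix, so it is only meant for small examples.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Seed the selection with the two mutually most distant points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}
    # min_dist[k]: distance from candidate k to its nearest selected point.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    while len(selected) < n_select:
        # Pick the candidate farthest from everything selected so far
        # (ties are broken by the lowest index).
        nxt = max(sorted(remaining), key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])
    return selected


# Toy bit vectors standing in for fingerprints.
fps = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)]
print(kennard_stone(fps, 3))  # [0, 4, 2]
```

The two extreme vectors are chosen first, followed by the vector that lies farthest from both, which is how the procedure spreads the selected set over the descriptor space.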

Overview of CycPeptMP framework
Figure 2 shows the overall architecture of the CycPeptMP model. We designed three-level representations of peptides and used each for three different sub-models to extract the atom-, monomer-, and peptide-level molecular representations. Initially, the input peptide was divided into monomers, and respective 3D conformations were generated from the peptide and monomers. Subsequently, atom- and peptide-level features were extracted from the peptide conformation and used as input for the atom and peptide models, respectively. Monomer-level features extracted from the monomer conformation were used as inputs for the monomer model. Finally, the three-level latent feature vectors extracted using the three sub-models were combined to derive the membrane permeability prediction values.

Division of the monomer
Unlike small molecules, most cyclic peptides are composed of a combination of monomers that are the standard building blocks in their chemical synthesis. CycPeptMPDB also provides a monomer-level sequence representation because most membrane permeability studies of cyclic peptides performed modifications at the monomer level. Therefore, we designed monomer-level features to accurately capture the subtle sequence variations of cyclic peptides. The adopted definition of a monomer corresponded to that provided by CycPeptMPDB. Although the peptide and ester bonds on the side chain were cleaved in CycPeptMPDB, bonds existing anywhere other than the macrocycle were not subjected to division to fully express the properties of the local structure. Merely hydrolysing the peptide bond could generate a new hydrogen bond donor, potentially misrepresenting the substructure's original physicochemical properties. Hence, an appropriate capping is required when decomposing peptides into monomers. When generating the conformation and calculating the monomer descriptor, the cleaved amide group or O atom of the amide-to-ester substitution was methylated (addition of CH3), and the carboxyl group was converted to an aldehyde (addition of H).

Peptide and monomer descriptors
To design peptide- and monomer-level features, the whole peptide and each of its monomers were each represented by 16 descriptors, including LogP and polar surface area (Table 1; the correlation matrix of the selected descriptors is shown in Supplementary Fig. S1). Initially, the 3D conformations of peptides and monomers were generated using the RDKit package (version 2022.09.5) [56]. The initial structures were generated using the ETKDG method and then optimized by energy minimization with the UFF force field. Subsequently, the 2D and 3D descriptors were calculated using a single-conformation 3D structure. The 16 descriptors were selected as follows: first, a total of 1857 descriptors (1689 2D and 168 3D descriptors) were calculated for whole cyclic peptides (peptide descriptors) and all monomers (monomer descriptors) using MOE software (version 2019.01) [57], the Mordred package (version 1.2) [58], and the RDKit package. For 2D and 3D peptide descriptors, we removed all descriptors with constant values among cyclic peptides within the dataset. For descriptor pairs with an absolute correlation coefficient of 0.9 or more, the one with the lower correlation with permeability was excluded. Consequently, 407 peptide descriptors (335 2D and 72 3D descriptors) were selected. Performing further feature selection and using only important features can reduce overfitting and improve model interpretability. Random forest (RF) is commonly used as an algorithm for robust feature selection, even with many variables. It can provide quantitative measures of the importance of each variable in prediction. Therefore, we constructed two RF models, one with the 2D and one with the 3D peptide descriptors. Subsequently, seven 2D and nine 3D peptide descriptors were selected based on the assigned feature importance (Supplementary Fig. S2). Finally, the same 16 descriptors were adopted as monomer descriptors, and peptide and monomer descriptors were standardized based on the Z-score.
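The first filtering stage (removing constant descriptors, then pruning one member of each highly correlated pair) and the final Z-score standardization can be illustrated with a small sketch. The greedy ordering used here, visiting descriptors from most to least correlated with the target, is one possible way to realize the pairwise rule and is an assumption; the function names and toy descriptor values are hypothetical. The random forest importance step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(columns, target, threshold=0.9):
    """columns maps descriptor name -> list of values; returns the names kept."""
    # Step 1: drop descriptors that are constant across the dataset.
    varying = [name for name, vals in columns.items() if len(set(vals)) > 1]
    relevance = {name: abs(pearson(columns[name], target)) for name in varying}
    # Step 2: visit descriptors from most to least relevant and keep one only
    # if it is not highly correlated with a descriptor that was already kept.
    kept = []
    for name in sorted(varying, key=lambda n: -relevance[n]):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def zscore(values):
    """Standardize values to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


# Toy data: 'logp_noisy' nearly duplicates 'logp'; 'const' never varies.
columns = {
    "logp": [1.0, 2.0, 3.0, 4.0],
    "logp_noisy": [1.0, 2.2, 2.9, 4.1],
    "const": [5.0, 5.0, 5.0, 5.0],
    "tpsa": [3.0, 1.0, 4.0, 2.0],
}
target = [2.0, 4.0, 6.0, 8.0]
print(filter_descriptors(columns, target))  # ['logp', 'tpsa']
```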

Atom model
We designed atom-level features so that the atom model could capture minor changes, such as enantiomers, by node features (Node) and global changes in the entire molecule by three types of node-pair relative relationship matrices (Bond, Graph, and Conf). As shown in Table 2, heavy atoms were considered nodes, and node features were represented as Node, bonded interaction weights were represented as $\mathrm{Bond} \in \mathbb{R}^{N_{atoms} \times N_{atoms}}$, graphic pairwise distances were represented as $\mathrm{Graph} \in \mathbb{R}^{N_{atoms} \times N_{atoms}}$, and 3D pairwise distances were represented as $\mathrm{Conf} \in \mathbb{R}^{N_{atoms} \times N_{atoms}}$.
Since the transformer can effectively learn relationships between distant atoms even when the number of atoms is large (the maximum number of heavy atoms in the experimental data was 128) [59], we constructed a transformer-based atom model to capture the overall graph structure and 3D conformation of the peptide (Fig. 3 (A)). Bond recorded the bond information of the molecule and controlled message propagation between neighboring nodes by assigning weights to each bond type. According to the chemical bonding principle, bonds with more electron participation (such as unsaturated bonds) were assigned higher weights to enhance the exchange of information between atoms [48]. Meanwhile, many transformer-based models record the positional relationships between nodes or tokens using traditional absolute positional encoding [45,60]. However, some studies have reported that relative positional encoding can improve prediction accuracy [47,59]. To capture the local relationship between each node, embedded Bond was used for positional encoding and added to [...] where $\mathrm{Graph}_{i,j}$ and $\mathrm{Conf}_{i,j}$ are the distances between atoms $i$ and $j$ calculated from the graph representation and the 3D conformation, respectively. Furthermore, we designed a structure-enhanced transformer encoder to learn the structural and 3D conformational information of peptides using focused attention. The encoder comprised two blocks, one using $\mathrm{Strength}_{graph}$ and another using $\mathrm{Strength}_{conf}$. This approach attenuates attention between less relevant pairs based on the distance, providing a simplified approach to modeling complex molecular structures as follows (the case of the graph block): [...] $x^{graph}_{l} = \mathrm{LayerNorm}(\mathrm{residual} \ldots$ [...] where $x^{graph}_{l-1}$ and $x^{graph}_{l}$ are the updated latent features of the graph block in the $(l-1)$-th and $l$-th layers, respectively, $h$ is the head number of multi-head attention, and [...], and $W^{O} \in \mathbb{R}^{d_{model} \times d_{model}}$ are trainable parameters. In the case of the conf block, $x^{conf}_{l}$ can be calculated from $x^{conf}_{l-1}$ and $\mathrm{Strength}_{conf}$ by the same process as in the graph block.
Finally, the outputs $x^{graph}_{out}$ and $x^{conf}_{out}$ of the two blocks were weighted using the hyperparameter $\lambda_g$ and the concatenated feature vector was used to derive the final output $out_{atom}$ of the atom model as follows: [...]
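As an illustration of the Graph matrix used by the atom model, the sketch below computes pairwise distances on a molecular graph. It assumes that the "graphic pairwise distance" is the number of bonds on the shortest path between two heavy atoms; the text does not spell out this definition, so this is an assumption, and the function name is hypothetical.

```python
from collections import deque


def graph_distance_matrix(n_atoms, bonds):
    """Shortest-path bond counts between all atom pairs via breadth-first search.

    bonds is a list of (i, j) index pairs; unreachable pairs are marked -1.
    """
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = [[-1] * n_atoms for _ in range(n_atoms)]
    for src in range(n_atoms):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[src][v] < 0:  # not visited yet
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


# Toy four-membered ring: 0-1-2-3-0.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(graph_distance_matrix(4, ring)[0])  # [0, 1, 2, 1]
```

In a ring, the distance wraps around the cycle, so atom 3 is one bond from atom 0 rather than three; this is the kind of topological information that a purely sequential encoding of the atoms would miss.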

Monomer model
We constructed a monomer model based on the 16 types of monomer descriptors to capture the partial structural information of peptides at the sequence level (Fig. 3 (B)). The CNN was used to learn partial structural features and sequence information. For the convolution layer, we used the general 1D-CNN layer or a CyclicConv layer [38] that considers peptide circularity. The use of the CyclicConv or 1D-CNN layer was determined by hyperparameter tuning. Finally, the monomer model derived the latent feature $out_{monomer}$.
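The difference between an ordinary 1D convolution and one that respects circularity can be sketched on a single descriptor channel. The circular variant below simply wraps indices around the ends of the sequence; whether the CyclicConv layer of [38] is implemented exactly this way is an assumption, and both function names are hypothetical. An odd kernel length is assumed.

```python
def conv1d_zero_padded(seq, kernel):
    """Same-length 1D convolution (cross-correlation) with zero padding."""
    n, half = len(seq), len(kernel) // 2
    out = []
    for i in range(n):
        total = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:  # positions outside the sequence contribute zero
                total += w * seq[idx]
        out.append(total)
    return out


def conv1d_circular(seq, kernel):
    """Same-length 1D convolution where the sequence is treated as a ring."""
    n, half = len(seq), len(kernel) // 2
    return [
        sum(w * seq[(i + j - half) % n] for j, w in enumerate(kernel))
        for i in range(n)
    ]


# One descriptor value per monomer of a toy four-residue ring.
values = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 1.0, 1.0]
print(conv1d_zero_padded(values, kernel))  # [3.0, 6.0, 9.0, 7.0]
print(conv1d_circular(values, kernel))     # [7.0, 6.0, 9.0, 8.0]
```

Only the first and last positions differ: with wrapping, the first monomer also sees the last one as its neighbour, which reflects the ring closure.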

Peptide model
To capture the characteristics of the entire molecule, we used 16 peptide descriptors representing physicochemical properties and 2048-bit Morgan FP (1024-bit, radius: 2; 1024-bit, radius: 3) representing substructural information as peptide-level features.
The descriptor and Morgan FP were each trained with different multilayer perceptrons (MLPs), and the latent feature vectors $x^{desc}_{out}$ and $x^{fp}_{out}$ were concatenated and used to derive the final output $out_{peptide}$ of the peptide model as follows: $out_{peptide} = \mathrm{Linear}(\mathrm{Concat}(x^{desc}_{out}, x^{fp}_{out}))$.

Fusion model
As shown in Fig. 2, the output latent feature vectors $out_{atom}$, $out_{monomer}$, and $out_{peptide}$ of the three sub-models were concatenated to generate the final molecular feature vector, which was passed through a shared layer for the final permeability prediction $out_{fusion}$. As a model becomes more complex, problems such as vanishing gradients may occur, preventing input information from being transmitted. Auxiliary loss is a learning technique in which additional loss terms are introduced to optimize the NN learning process. Directly propagating errors to the intermediate network layers can prevent vanishing gradients and improve embedding and learning efficiency [61]. Hence, in addition to the main loss $L_{fusion}$ calculated from the output of the fusion model, we designed the three sub-model losses $L_{atom}$, $L_{monomer}$, and $L_{peptide}$ derived from the output of each sub-model (the definition of $L_{monomer}$ is shown in Equation (6a)), and the layer losses $L^{layer}_{a}$, $L^{layer}_{m}$, and $L^{layer}_{p}$ derived from the averaged outputs of the layers in each block (Transformer, CNN, and MLP) of the three sub-models (the definition of $L^{layer}_{m}$ is shown in Equation (6b)): [...] $L^{layer}_{m} = \mathrm{Lossfunc}(\mathrm{Linear}(\mathrm{Mean}(x^{mono}_{1}, \ldots$ [...] The loss function during training is presented in Equation (7): [...] where the weight parameter $\gamma_{sub}$ was set to 0.10 and $\gamma_{layer}$ was set to 0.05. Only the output value $out_{fusion}$ of the fusion model was used during inference. The hyperparameters of CycPeptMP were determined by 150 trials using Optuna software [62] based on the average RMSE of three runs; the search range and results are shown in Supplementary Table S1, and hyperparameters with a significant impact are shown in Supplementary Fig. S3.
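Given the loss terms and the two weights named above, one plausible form of the combined training loss is a weighted sum in which the main loss is unweighted, the sub-model losses share $\gamma_{sub}$, and the layer losses share $\gamma_{layer}$. This is a sketch under that assumption and is not necessarily the exact form of Equation (7); the symbol $L_{total}$ is introduced here for illustration only.

```latex
\begin{equation*}
L_{total} = L_{fusion}
  + \gamma_{sub}\,\bigl(L_{atom} + L_{monomer} + L_{peptide}\bigr)
  + \gamma_{layer}\,\bigl(L^{layer}_{a} + L^{layer}_{m} + L^{layer}_{p}\bigr),
\qquad \gamma_{sub} = 0.10,\quad \gamma_{layer} = 0.05 .
\end{equation*}
```

Under this form the auxiliary terms act as small regularizing signals, and setting both weights to zero recovers training on $L_{fusion}$ alone.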

Data augmentation
Although the amount of available biological data has increased, experimental data remains limited compared to data for natural language processing and computer vision. For example, the Tox21 dataset deals with the toxicity classification of small molecules and has only approximately 8000 data points. This limitation in biological data, particularly the scarcity of data with measurement values, has motivated the increased use of self-supervised learning approaches, such as contrastive learning [63] and pretraining [44,45]. These methods are commonly employed in scenarios where labeled data is scarce while large amounts of unlabeled structural data are available. However, these techniques remain challenging for cyclic peptides because even unlabeled structures are far less abundant than for small molecules. Apart from these techniques, data augmentation (such as oversampling and data warping) has been commonly used in the image processing field to increase training efficiency when the data are insufficient.
The augmented data represent a more comprehensive set of possible data points that minimizes the distance between the training and any future testing sets and reduces the risk of overfitting [64]. We used three augmentation methods to generate 60 replicas based on the properties of SMILES, the nature of cyclic peptide sequences, and the complexity of cyclic peptide conformational changes to improve the learning efficiency of the model. First, the SMILES enumeration technique [65] was used to permute the atom order and generate input for the atom model with a different ordering. Subsequently, the input of the monomer model was rearranged using sequence arrangement considering the circularity of the cyclic peptide: the aligned monomer descriptors were augmented by the combination of sequence translation and rotation (change the start point of sequence) as shown in Fig.
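The idea of changing the start point of a cyclic sequence can be sketched as enumerating all rotations of the monomer list. Only this rotation step is shown; how translation is defined is not spelled out in the text above and is left out. The function name and the monomer labels are hypothetical.

```python
def cyclic_rotations(monomers):
    """All orderings of a cyclic sequence obtained by changing its start point."""
    n = len(monomers)
    return [monomers[start:] + monomers[:start] for start in range(n)]


# Toy four-monomer ring; every rotation describes the same cyclic peptide.
peptide = ["Ala", "meLeu", "Pro", "dTyr"]
for replica in cyclic_rotations(peptide):
    print(replica)
# ['Ala', 'meLeu', 'Pro', 'dTyr']
# ['meLeu', 'Pro', 'dTyr', 'Ala']
# ['Pro', 'dTyr', 'Ala', 'meLeu']
# ['dTyr', 'Ala', 'meLeu', 'Pro']
```

Each rotation would carry the same permeability label, so a ring of n monomers yields n training inputs for the monomer model from a single measurement.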

Baseline methods
We validated the performance of CycPeptMP based on comparisons with seven baseline methods.
• Three traditional baselines: We constructed an RF model with 2048-bit Morgan FP, a support vector machine (SVM) model with seven 2D peptide descriptors, and an SVM model with 16 2D and 3D peptide descriptors to represent traditional cyclic peptide membrane permeability prediction methods. The hyperparameters of the RF and SVM models were determined by a grid search (Supplementary Table S2).
• Two transformer-based methods: MAT [50] and SAT [47] were compared as state-of-the-art transformer-based methods for predicting small-molecule properties. MAT augments the transformer's self-attention mechanism with domain-specific knowledge, incorporating inter-atomic distances and molecular graph structure into the attention calculation to capture structural information. SAT addresses the problem that positional encoding in traditional transformers does not necessarily capture the structural similarity between nodes. It proposes a structure-aware transformer that incorporates structural information into self-attention by extracting a subgraph representation rooted at each node before computing the attention.
• Two multi-level feature methods: PharmHGT [49] designs features on the atom and fragment levels and constructs a heterogeneous graph considering the correspondence between atoms and fragments for a transformer-based model. FinGAT [41] uses a GAT model to extract atom-level information and combines it with Morgan FP to capture the molecular structure from multiple perspectives.
The hyperparameters of four DL-based models were determined by 50 trials using Optuna software based on the average RMSE of three runs (Supplementary Table S3).

Performance comparison for the test set
The prediction accuracy results for the test set are shown in Table 3 (the prediction accuracy and results for the validation set are shown for reference purposes in Supplementary Table S4 and Supplementary Fig. S4). CycPeptMP ranked first in all evaluation metrics, reflecting a significant improvement in prediction performance over all existing methods (MAE = 0.355, Fig. 5). Considering the structural diversity of the test set, CycPeptMP showed good generalization performance and, through augmentation, could learn the complex structures of cyclic peptides, to which pre-training is difficult to apply. The RF model constructed based on Morgan FP showed good prediction performance and ranked third among all methods (MAE = 0.485, Supplementary Fig. S5 (A)). SVM with 2D peptide descriptors (MAE = 0.488, Supplementary Fig. S5 (B)) had lower prediction accuracy than the RF model, whereas adding the 3D descriptors to the SVM improved prediction accuracy, making it superior to the RF model for the test set (MAE = 0.418, Supplementary Fig. S5 (C)). Cyclic peptide membrane permeation by passive diffusion negatively correlated with molecule size. SVM could partially predict permeability by using lipophilicity descriptors, such as LogP, which are largely dependent on molecular weight. CycPeptMP effectively combined Morgan FP and 16 2D and 3D peptide descriptors as peptide-level information to comprehensively characterize peptide structures from a topological and physicochemical perspective, leading to an improvement in prediction capabilities. Neither of the graph representation transformer-based methods, MAT (MAE = 0.538, Supplementary Fig. S5 (D)) or SAT (MAE = 0.690, Supplementary Fig. S5 (E)), could predict membrane permeability well. Although MAT and SAT are state-of-the-art methods for predicting small-molecule properties, they could not effectively learn the structures of more complex cyclic peptides since they only utilize atom-level information without any augmentation technique. By using additional information beyond the atom level, PharmHGT with fragment MACCS Keys (MAE = 0.485, Supplementary Fig. S5 (F)) and FinGAT with molecular Morgan FP (MAE = 0.493, Supplementary Fig. S5 (G)) had significantly improved prediction accuracies compared to MAT and SAT, with the same level of accuracy as the RF model and 2D SVM. Hence, designing features from various perspectives may be key to successfully predicting the membrane permeability of cyclic peptides. These findings indicated that CycPeptMP effectively employed three levels of features to capture a wide range of structural information from the smallest atomic detail to the broader peptide-level conformation.
Meanwhile, different experimental conditions can significantly alter the measurements. CycPeptMPDB records all reported values from different literature assays for the same peptide (this study used values from the most recent literature). For example, cyclosporin A is a peptide with PAMPA measurements reported from five literature sources with permeabilities of −5.01, −6.2, −6.15, −5.71, and −5.72 (max: −5.01, min: −6.2, std: 0.427) in chronological order of publication; 1NMe3 is a peptide with PAMPA measurements reported from six literature sources with permeabilities of −4.5, −4.4, −6, −6.24, −6.4, and −5.52 (max: −4.4, min: −6.4, std: 0.798) in chronological order of publication. Since this variability is already present in the experimental measurements, the prediction accuracy of CycPeptMP (MAE = 0.355) may be close to the limit of prediction. The prediction results for other assays using the model trained with PAMPA data are shown in Supplementary Table S5.
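The spread statistics quoted above can be reproduced from the listed values. The short check below uses the population standard deviation from the Python standard library, which matches the reported figures; that the original analysis used the population rather than the sample form is inferred from this agreement.

```python
import statistics

# PAMPA log-permeability values quoted in the text, in publication order.
measurements = {
    "cyclosporin A": [-5.01, -6.2, -6.15, -5.71, -5.72],
    "1NMe3": [-4.5, -4.4, -6.0, -6.24, -6.4, -5.52],
}

for name, values in measurements.items():
    spread = round(statistics.pstdev(values), 3)  # population standard deviation
    print(name, max(values), min(values), spread)
# cyclosporin A -5.01 -6.2 0.427
# 1NMe3 -4.4 -6.4 0.798
```

Both standard deviations exceed the model's MAE of 0.355, which supports the point that inter-laboratory variation sets a floor on achievable prediction error.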

Lower limit processing of permeability
We rounded the permeability with −10 ≤ $\mathrm{LogP_{exp}}$ < −8 to −8 because the detection limit in most of the literature is −8. Among them, most peptides were recorded in CycPeptMPDB with $\mathrm{LogP_{exp}}$ = −10. Most of these were not actually measured as −10 but were set to −10 by CycPeptMPDB because no clear value was reported. Therefore, their membrane permeability values were unreliable. To discuss the effect of these data, we separately calculated the accuracy for 12 peptides with $\mathrm{LogP_{exp}}$ = −10, 6 peptides with −10 < $\mathrm{LogP_{exp}}$ ≤ −8, and 326 other peptides with −8 < $\mathrm{LogP_{exp}}$ in the test set. Peptides with $\mathrm{LogP_{exp}}$ = −10 could not be predicted by any method (MAE = 0.766 to 1.239, Supplementary Fig. S5). We have included these unreliable experimental values in our data to incorporate as much data as possible; however, it may be more appropriate to eliminate

Performance comparison of 10-fold cross-validation
To assess generalization performance, the Kennard-Stone algorithm was employed to maximize the distance between data points in the chemical space of the test set, so as to extract the most diverse test set possible from the CycPeptMPDB data. To complement this with an evaluation based on multiple random samplings, we performed a new 10-fold cross-validation with different random seeds for each run without altering the determined hyperparameters. As shown in Table 4, CycPeptMP consistently demonstrated the highest prediction performance for the difficult-to-predict test set (MAE = 0.355) and 10-fold cross-validation (MAE = 0.352). All baseline methods had higher accuracy for the 10-fold cross-validation than for the test set (the prediction performance of each model was broadly similar to that on the original validation set shown in Supplementary Table S4).
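A seeded k-fold split and the four evaluation metrics used in this study (MAE, MSE, R, and R²) can be sketched in plain Python as below. The function names are hypothetical, the split strategy is a simple shuffled partition assumed for illustration, and the toy values are not results from the paper.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices with a fixed seed and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, Pearson R, and coefficient of determination R2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(ss_t * ss_p)
    r2 = 1.0 - sum(e * e for e in errors) / ss_t
    return {"MAE": mae, "MSE": mse, "R": r, "R2": r2}


folds = k_fold_indices(25, k=10, seed=1)
print(len(folds), sum(len(f) for f in folds))  # 10 25

# Toy predictions: each fold would serve once as the held-out set.
print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Changing `seed` between runs gives a different partition each time, which mirrors the use of different random seeds for each cross-validation run.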

Ablation study of atom and monomer models
We conducted ablation studies on the atom and monomer models, which have complex architectures (Fig. 6; the validation results are shown in Supplementary Fig. S6 (A)). For the atom model, A is the original model, and A-aug is the result without data augmentation. We measured the prediction accuracy when not using the Bond matrix (A-bond), using ordinary absolute positional encoding [66] instead of the Bond matrix (A-abpe), retaining only the Conf block (A-graph), or retaining only the Graph block (A-conf). As shown in Fig. 6, the prediction accuracy of the atom model was significantly improved by augmentation (A: 0.454, A-aug: 0.733). Regarding the architectural changes of the atom model, the original A showed the highest prediction accuracy, while the deletion of any element decreased the prediction accuracy. The relationship between atoms was captured more effectively using Bond (A: 0.454) than absolute positional encoding (A-abpe: 0.471), and the impact of removing the Graph block (A-graph: 0.466) was greater than that of removing the Conf block (A-conf: 0.455).
For the monomer model, M is the original model and M-aug is the result without data augmentation. We also measured the accuracy change when replacing the general 1D-CNN layers with CyclicConv layers (M-Cyclic). Similar to the atom model results, the prediction accuracy of the monomer model was significantly improved by augmentation (M: 0.405, M-aug: 0.658). These results showed that SMILES enumeration for the atom model and sequence arrangement for the monomer model effectively improved learning efficiency. Moreover, the augmentation technique is essential for learning the complex structure of cyclic peptides. Additionally, the 1D-CNN layer (M: 0.405) was superior to the CyclicConv layer (M-Cyclic: 0.448), consistent with previous findings [38].
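As a rough illustration of why reordering the input helps, the sketch below enumerates the cyclic shifts of a monomer sequence, each of which describes the same ring starting from a different residue. This is one plausible reading of the sequence arrangement, not the authors' exact procedure; the function names and the toy monomer labels are hypothetical.

```python
def cyclic_rotations(monomers):
    """Return every cyclic shift of a monomer sequence.

    Each shift starts at a different residue but describes the same ring.
    """
    n = len(monomers)
    return [monomers[s:] + monomers[:s] for s in range(n)]


def augment(dataset):
    """Expand (sequence, label) pairs so that every rotation keeps the original label."""
    return [
        (rotation, label)
        for sequence, label in dataset
        for rotation in cyclic_rotations(sequence)
    ]


# Toy example: one four-residue ring with a made-up permeability label.
toy_dataset = [(["Ala", "meLeu", "Pro", "Tyr"], -6.2)]
replicas = augment(toy_dataset)
```

A standard 1D-CNN sees each rotation as a different input, so training on all of them exposes the model to the ring's circularity without requiring a dedicated cyclic convolution.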

Ablation study of the fusion model
The ablation study for the fusion model measured the influence of the number of replicas generated by augmentation and of changes in architecture. Figure 7 (A) shows the accuracy based on 1 (no augmentation), 5, 10, 20, 30, 40, 50, and 60 (CycPeptMP) replicas per peptide. We observed a significant improvement in prediction accuracy compared to that without augmentation (F-1: 0.456) even with five replicas (F-5: 0.394). However, beyond 20 replicas, prediction accuracy remained approximately the same even as the amount of training data increased. This may be due to the limited diversity gained by merely reordering the inputs and the lack of diversity in the generated conformations (prediction results of the test set using regenerated conformations are shown in Supplementary Table S6). In Fig. 7 (B), F is the original CycPeptMP model; F-aux is the model without auxiliary loss; F-atom, F-mono, and F-pep represent the models lacking the respective sub-models; F-3D is the model that did not use any 3D information (Conf and 3D descriptors); and F-ensem is the model in which each sub-model directly predicted membrane permeability and the three predictions were averaged. Auxiliary loss improved prediction accuracy (F-aux: 0.366). Furthermore, prediction accuracy decreased when any of the three sub-models was removed, indicating that all three levels of information were important for predicting membrane permeability. The peptide model had the greatest influence (F-pep: 0.388), followed by the monomer (F-mono: 0.387) and atom models (F-atom: 0.368). The use of 3D information improved prediction accuracy only marginally (F-3D: 0.38). Accuracy may be improved by generating conformations using a more rigorous method, such as MD simulations. Correctly addressing the possible conformational distribution of peptides appears important. A detailed discussion of closed-conformation for the RDKit-generated conformations is presented in Supplementary Fig. S7. Finally, prediction accuracy also decreased when using a sub-model ensemble (F-ensem: 0.385). Hence, it was better to extract latent features than to have each sub-model directly predict permeability.
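To make the size of each ablation effect easier to compare, the short sketch below recomputes the MAE increase of every variant relative to the full model. It assumes the full fusion model F corresponds to the test-set MAE of 0.355 reported for CycPeptMP; the variable and function names are illustrative only.

```python
BASELINE_MAE = 0.355  # full CycPeptMP model (F) on the test set, as reported

# Test-set MAE of each ablated variant, copied from the text above.
ABLATION_MAE = {
    "F-1": 0.456,      # no augmentation (one replica per peptide)
    "F-5": 0.394,      # five replicas per peptide
    "F-aux": 0.366,    # without auxiliary loss
    "F-atom": 0.368,   # without the atom sub-model
    "F-mono": 0.387,   # without the monomer sub-model
    "F-pep": 0.388,    # without the peptide sub-model
    "F-3D": 0.38,      # without 3D information
    "F-ensem": 0.385,  # sub-model ensemble instead of fusion
}


def mae_increase(baseline, ablated):
    """Map each variant to (absolute MAE increase, relative increase in percent)."""
    return {
        name: (round(mae - baseline, 3), round(100 * (mae - baseline) / baseline, 1))
        for name, mae in ablated.items()
    }


increases = mae_increase(BASELINE_MAE, ABLATION_MAE)
# Variants ordered from most to least harmful.
ranked = sorted(increases.items(), key=lambda item: item[1][0], reverse=True)
for name, (delta, percent) in ranked:
    print(f"{name:8s} +{delta:.3f} MAE (+{percent:.1f}%)")
```

Ordered this way, removing augmentation is by far the most damaging change, while dropping the auxiliary loss or the atom sub-model costs the least.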

Comparison with MD-based method
Cyclic peptides tend to exist in various conformations, and their conformational transitions are slow relative to simulation time scales. The first MD-based large-scale prediction of cyclic peptide membrane permeability used steered MD and replica-exchange umbrella sampling to accelerate sampling and simulated the membrane permeation process of 100 six-residue and 56 eight-residue peptides through a lipid bilayer [30]. We compared their prediction results with the CycPeptMP results for 23 peptides (Supplementary Table S7) included in the three validation sets (16 peptides) and the test set (7 peptides).
While the MD-based method could not successfully predict the membrane permeability of these 23 peptides (MAE = 1.521), CycPeptMP accurately predicted them all (MAE = 0.107) (Fig. 8). Hydrophobic cyclic peptides have insufficient solubility, diffuse slowly in the unstirred water layer, and are likely to be adsorbed to the membrane. These behaviors could not be reproduced using the inhomogeneous solubility-diffusion model (ISDM), which only considers direct membrane permeation processes [30]. The authors reported a prediction accuracy (R) of only 0.21 for all 100 six-residue peptides; however, the accuracy increased to 0.54 when 33 hydrophobic peptides (AlogP ≥ 4) were excluded. A similar trend was observed among the 23 peptides compared in this study: the

Conclusion
CycPeptMP is a high-performance deep learning-based technique for predicting the membrane permeability of cyclic peptides. It incorporates atom-, monomer-, and peptide-level features and improves training efficiency through three types of data augmentation techniques. CycPeptMP exhibits excellent prediction accuracy and generalization performance compared to existing methods. Moreover, we confirmed that CycPeptMP accurately predicts, at much lower computational cost, the permeability of peptides for which MD-based methods fail. With its ability to rapidly identify high-permeability peptides, CycPeptMP has the potential to significantly advance cyclic peptide drug discovery. It also paves the way for the development of more effective DL-based techniques in related fields. Future studies should focus on improving the prediction performance by generating 3D conformations with a more rigorous method.

Key Points
• This study presents CycPeptMP, a novel DL-based method to predict the membrane permeability of cyclic peptides. CycPeptMP utilizes a multi-level feature design and data augmentation to simplify the characterization of complex peptide structures and improve model performance.
• CycPeptMP achieves excellent performance using a test set containing various structures, demonstrating the functionality gained by implementing the three-level feature-appropriate architectural design.
• CycPeptMP effectively determines the permeability of peptides that are difficult to predict using MD-based methods and can promote the efficacy of cyclic peptide drug discovery in myriad research directions, such as structure-activity relationship analysis and lead optimization.

Figure 1. Experimental data distribution. (A) Logarithm of experimentally determined membrane permeability (LogP exp). (B) Molecular weight. Valid-1 is the dataset used for the first-time evaluation of the validation set; the corresponding training data sets are Train, Valid-2, and Valid-3.

Figure 2. Overall framework of the CycPeptMP model. The model incorporates the transformer-based atom, convolutional neural network (CNN)-based monomer, and MLP-based peptide sub-models. The three-level expression vectors extracted using the three sub-models are concatenated and passed through a shared layer to derive the final permeability prediction value.

Table 2 .
Figure 3. (A) Atom model architecture. Node features and three types of node-pair relative relationship matrices were used as input for the transformer-based model. (B) Architecture of the monomer model. Monomer descriptors were aligned based on the sequence information and used as input for the CNN-based model.

Figure 4. Sequence arrangement in the monomer model. The aligned monomer descriptors were translated and rotated based on the sequence information.

Figure 5. CycPeptMP prediction results for the test set. The predicted value of the test set is the average value of three runs.

Figure 6. Ablation results (MAE) for the atom and monomer models using the test set.

Figure 7. Ablation results (MAE) for the fusion model using the test set. (A) Different numbers of input replicas. (B) Different architectures.

Figure 8. (A) Prediction results of the MD-based method. Black dots represent hydrophobic peptides with AlogP ≥ 4; green dots represent the remaining peptides with AlogP < 4. (B) Prediction results of CycPeptMP.

Table 1. Selected descriptors, arranged in order of importance

| Descriptor | Source | Description |
| --- | --- | --- |
| | Mordred | Van der Waals surface area using EState indices and surface area contribution |
| density | MOE | Molecular mass density (molecular weight divided by approximated van der Waals volume) |
| MolLogP | RDKit | Wildman-Crippen LogP value |
| fr_Al_OH | RDKit | Number of aliphatic hydroxyl groups |
| logP(o/w) | MOE | Log of the octanol/water partition coefficient |
| lip_violation | MOE | Number of violations of Lipinski's Rule of Five |
| h_logD | MOE | Log of the octanol/water distribution coefficient at pH 7 |
| 3D | | |
| dens | MOE | Molecular mass density (molecular weight divided by 3D van der Waals volume) |
| FNSA4 | Mordred | Fractional charged partial negative surface area (version 4) |
| RNCS | Mordred | Relative negative charge surface area |
| FASA- | MOE | Fractional water accessible surface area of all atoms with negative partial charge |
| FCASA+ | MOE | Fractional positive charge weighted surface area |
| FAsa:P | MOE | Fractional water accessible surface area of all polar atoms |
| FNSA2 | Mordred | Fractional charged partial negative surface area (version 2) |

where $d_{\mathrm{model}}$ is the attention dimension of the atom model, $W_{\mathrm{node}} \in \mathbb{R}^{N_{\text{node-features}} \times d_{\mathrm{model}}}$ and $W_{\mathrm{bond}} \in \mathbb{R}^{N_{\mathrm{atoms}} \times d_{\mathrm{model}}}$ are trainable parameters, and $x \in \mathbb{R}^{N_{\mathrm{atoms}} \times d_{\mathrm{model}}}$ is the updated input for the encoder block. Two types of distance matrices, Graph and Conf, i.e. the shortest pairwise graph distance and the 3D Euclidean distance between atoms, were used to capture the peptide's overall structure and 3D conformation. The distance maps were processed through an attenuation function to weaken distant interactions as follows:
Finally, considering the complex conformational changes during membrane permeation of cyclic peptides, 60 different conformations per peptide/monomer were generated using RDKit to incorporate more diverse 3D information into the model. Cyclic peptide conformations were used to calculate the Conf matrix for the atom model and peptide descriptors for the peptide model. Monomer conformations were used to calculate the monomer descriptors for the monomer model. Introducing variations of the input data makes our model more robust to cyclic peptide conformational flexibility and allows it to partially consider circularity. Data augmentation effectively increases the size of the training set, leading to more efficient and stable training. During training, each replica was given the same label and treated as independent data. During inference, relying on a single conformation could introduce bias, considering the conformational flexibility of cyclic peptides. Therefore, the average of 60 replicas was used as the final predicted value to represent the conformational ensemble of peptides.
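The replica-averaging step at inference time can be sketched as follows. The `model` callable and the toy replica values are hypothetical stand-ins; the sketch shows only the averaging logic, not the actual CycPeptMP interface.

```python
from statistics import fmean


def predict_peptide(model, replicas):
    """Average a model's predictions over all replicas of one peptide.

    model: any callable mapping one replica to a predicted LogP value (hypothetical).
    replicas: the augmented inputs for a single peptide (60 in the setting above).
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    predictions = [model(replica) for replica in replicas]
    return fmean(predictions)


def toy_model(replica):
    """Stand-in model that simply returns the replica's stored value."""
    return replica


# Toy replicas: three slightly different predictions for the same peptide.
averaged = predict_peptide(toy_model, [-6.0, -5.0, -7.0])
```

Averaging over replicas means that no single generated conformation or input ordering dominates the reported permeability.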

Table 3. Performance comparison between seven baseline methods and CycPeptMP using the test set. The metrics are the averaged values of three repeated runs; the best result for each metric is indicated in bold.

Table 4. Performance comparison between seven baseline methods and CycPeptMP by 10-fold cross-validation. The metrics are the averaged values of ten repeated runs; the best result for each metric is indicated in bold.